Translation as problem solving: uses of comparable corpora
Abstract
The paper describes an approach that uses comparable corpora as tools for solving translation problems. First, we present several case studies for practical translation problems and their solutions using large comparable corpora for English and Russian. Then we generalise the results of these studies by outlining a practical methodology, which has been tested in the course of translation training.
Similar resources
Extracting a parallel corpus from comparable documents to improve translation quality in machine translation systems
Data used to train statistical machine translation systems are usually drawn from three resources: parallel, non-parallel and comparable text corpora. Parallel corpora are the ideal resource for translation, but because such texts are scarce, non-parallel and comparable corpora are also used, for example for parallel text extraction. Most existing methods for exploiting comparable corpora loo...
Accurate Parallel Fragment Extraction from Quasi-Comparable Corpora using Alignment Model and Translation Lexicon
Although parallel sentences rarely exist in quasi-comparable corpora, such corpora may still contain parallel fragments that are helpful for statistical machine translation (SMT). Previous methods cannot accurately extract parallel fragments from quasi-comparable corpora. To solve this problem, we propose an accurate parallel fragment extraction system that uses an alignment model to locate the parallel f...
Translation Induction on Indian Language Corpora Using Translingual Themes from Other Languages
Identifying translations from comparable corpora is a well-known problem with several applications, e.g. dictionary creation in resource-scarce languages. The scarcity of high-quality corpora, especially in Indian languages, makes this problem hard: state-of-the-art techniques achieve a mean reciprocal rank (MRR) of 0.66 for English-Italian, but a mere 0.187 for Telugu-Kannada. There exist comp...
Automatic Building and Using Parallel Resources for SMT from Comparable Corpora
Building parallel resources for corpus-based machine translation, especially Statistical Machine Translation (SMT), from comparable corpora has recently received wide attention in machine translation research. In this paper, we propose an automatic approach for extracting parallel fragments from comparable corpora. The comparable corpora are collected from Wikipedia documents and t...
Collecting Comparable Corpora from the Web
Statistical machine translation (SMT) relies on the availability of rich parallel corpora. However, in the case of under-resourced languages, parallel corpora are not readily available. To overcome this problem, previous work has recognized the potential of using comparable corpora as training data. A critical first problem with such an approach is actually identifying and gathering corpora with pot...
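The mean reciprocal rank (MRR) quoted in the Indian-language translation-induction abstract is a standard ranking metric: for each source word, take the reciprocal of the rank at which its correct translation appears in the system's ranked candidate list (0 if it does not appear), then average over all source words. The sketch below assumes exactly one gold translation per word; the function name `mean_reciprocal_rank` and the toy Italian data are hypothetical illustrations, not taken from that paper.

```python
from typing import Sequence


def mean_reciprocal_rank(ranked_candidates: Sequence[Sequence[str]],
                         gold: Sequence[str]) -> float:
    """Average of 1/rank of the correct translation over all source words.

    Assumption: one gold translation per source word. A word whose gold
    translation is absent from its candidate list contributes 0.
    """
    if len(ranked_candidates) != len(gold):
        raise ValueError("need exactly one gold translation per query")
    if not gold:
        raise ValueError("no queries to evaluate")
    total = 0.0
    for candidates, answer in zip(ranked_candidates, gold):
        # Ranks are 1-based; only the first matching position counts.
        for rank, candidate in enumerate(candidates, start=1):
            if candidate == answer:
                total += 1.0 / rank
                break
    return total / len(gold)


# Hypothetical toy data: three source words, each with a ranked list of
# candidate translations produced by some induction method.
candidates = [
    ["casa", "dimora", "palazzo"],  # gold ranked 1st -> contributes 1/1
    ["cane", "gatto", "lupo"],      # gold ranked 3rd -> contributes 1/3
    ["mare", "lago", "fiume"],      # gold missing    -> contributes 0
]
gold = ["casa", "lupo", "oceano"]

print(round(mean_reciprocal_rank(candidates, gold), 3))  # 0.444
```

An MRR of 0.66 therefore means the correct translation tends to sit at or near the top of the list, while 0.187 means it is usually ranked around fifth or lower, or missing altogether.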